ABSTRACT

The power of the t, expected normal scores, Mann-Whitney U, Tukey,
modified Mann-Whitney U, and adaptive procedures was investigated when
sampling from population models empirically developed from test score
distributions. The models used were selected members of the beta
family. This investigation was unique in that, as the location
parameter changed, not only did the means of the alternative
distributions increase but the shapes of the distributions changed as
well. In general, the t statistic displayed greater power than the
other procedures. Closely behind t were the expected normal scores and
Mann-Whitney U procedures, with the others following. (Author)
INTRODUCTION

Frequently, educational and psychological researchers are confronted
with the problem of determining whether two independent samples come
from the same population. The major purpose of this study was to
investigate the small-sample power and the control of Type I error of
selected two-sample statistical procedures when sampling from
populations encountered in educational and psychological research.
For such research, test scores are often used as the criterion measure
reflecting the performance of an individual belonging to a target
population or an experimentally accessible population.

Numerous statistical procedures have been proposed to detect
differences in central tendency between two populations when
independent random samples are drawn from each. The two-sample
statistical procedures investigated were: the Student's t-test (t),
the Mann-Whitney U test (U), the Terry-Hoeffding normal scores test
(S), a Tukey quick test (T) as developed by Tukey (1959), a modified
Mann-Whitney U test (W) developed by Randles and Hogg (1972), and an
adaptive two-sample nonparametric procedure (A), also developed by
Randles and Hogg (1972).

The modified Mann-Whitney U test was a Mann-Whitney U statistic based
upon the [(N+1)/4]* largest observations and the [(N+1)/4]* smallest
observations in the combined sample. The adaptive procedure used in
this research was a modification of a procedure described by Randles
and Hogg (1972). The adaptive scheme classified the underlying
distribution as having either light or non-light tails, and a modified
Mann-Whitney U or a Mann-Whitney U statistic was used accordingly. The
criterion for classification was the range of the sample divided by
the mean deviation from the sample median.

*[p] denotes the greatest integer contained in p.
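A minimal Python sketch of the two statistics and the adaptive choice follows. The text does not spell out how the modified statistic is formed from the retained observations, so the sketch assumes one plausible reading: pool the samples, keep the k = [(N+1)/4] smallest and k largest values, and compute the Mann-Whitney U on the retained X and Y values only. The classification statistic is computed on the combined sample, the rule "a small ratio indicates light tails" is an assumption, and the criterion is left as a parameter because the exact scaling the study paired with its criterion value is not given here. All function names are hypothetical.

```python
import statistics


def mann_whitney_u(x, y):
    """Number of (x_i, y_j) pairs with y_j > x_i; ties count one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if yj > xi:
                u += 1.0
            elif yj == xi:
                u += 0.5
    return u


def modified_mann_whitney_u(x, y):
    """Assumed form: U on the k smallest and k largest observations of the
    combined sample, where k = [(N + 1)/4] (greatest integer)."""
    n_total = len(x) + len(y)
    k = (n_total + 1) // 4
    # Tag each value with its sample (0 = X, 1 = Y) before pooling.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    extremes = pooled[:k] + pooled[n_total - k:]
    x_ext = [v for v, g in extremes if g == 0]
    y_ext = [v for v, g in extremes if g == 1]
    return mann_whitney_u(x_ext, y_ext)


def tail_ratio(values):
    """Sample range divided by the mean absolute deviation from the median."""
    med = statistics.median(values)
    mean_dev = sum(abs(v - med) for v in values) / len(values)
    return (max(values) - min(values)) / mean_dev


def adaptive_statistic(x, y, criterion):
    """Hypothetical selector: treat the pooled sample as light-tailed when
    the ratio is at most `criterion` (assumed direction), then pick U."""
    ratio = tail_ratio(list(x) + list(y))
    if ratio <= criterion:
        return "modified U", modified_mann_whitney_u(x, y)
    return "U", mann_whitney_u(x, y)
```

For the pooled sample 1, ..., 10 the ratio is 9 / 2.5 = 3.6, so the choice of statistic flips depending on whether the criterion lies above or below that value.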

The t-test and its primary nonparametric competitor, the Mann-Whitney
U (or two-sample Wilcoxon) test, have been extensively researched with
regard to power and control of Type I error. Most of this research has
concentrated on sampling from populations whose forms are similar to
well-known theoretical distributions such as the normal, uniform,
exponential, logistic, and double exponential, and has considered
populations that differ in location parameter and/or scale parameter.
However, in educational and psychological research two crucial
questions to raise are:

1. How often are the underlying population distributions really
normal, uniform, exponential, logistic, double exponential, etc.?

2. How well do the various two-sample statistical procedures
detect differences between two populations when sampling
from population distribution types that exist in the field
of educational and psychological research?
CONSTRUCTION OF MODELS
In this investigation, the following three considerations were taken
into account in the construction of population distribution models.

1. The population models should exemplify test score distributions
that frequently occur in educational and psychological research.

2. The population models should be bounded, because test score
distributions are bounded at both the upper and lower ends.

3. Since test score distributions are bounded, any change in central
tendency from one distribution to another is likely to change other
characteristics as well, such as skewness and kurtosis.

Descriptive statistics for raw score distributions of the Iowa Tests
of Basic Skills and the Iowa Tests of Educational Development, based
upon National and Iowa norms, were provided by Brandenburg (1972).
While these data indicate that distributions of raw scores from
standardization populations tend to be nearly symmetrical and very
light-tailed, most educational and psychological research is not
conducted by sampling from such populations. Instead, a more common
practice is to use students from an intact classroom, building, or
school system and randomly divide them into experimental and control
groups. Data from the files of the Iowa Testing Programs, including
122 classes (N ≤ 30), 41 buildings (30 < N ≤ 90), and 16 systems, were
the basis for the construction of the models. A few generalizations
drawn from the data were:

1. The sample means and standard deviations were curvilinearly
related.

2. The sample means and measures of skewness were linearly related.

3. The sample means and measures of kurtosis were curvilinearly
related.

4. The measures of skewness and kurtosis were curvilinearly related.

5. The above relationships held for classes, buildings, and systems.

6. The majority of test score distributions were light-tailed and
positively skewed.

Five beta distribution models were selected to represent these
population distributions. These beta models, along with summary
descriptive statistics including skewness (√β1) and kurtosis (β2),
are illustrated in Figure 1.
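The skewness and kurtosis reported for each model follow directly from the two shape parameters of a beta distribution. The sketch below uses the standard closed-form moments of Beta(a, b), with √β1 taken as the signed skewness. The shape parameters of the five models appear only in Figure 1, so the values used in the checks are illustrative, not the study's.

```python
import math


def beta_moments(a, b):
    """Mean, signed skewness (sqrt(beta_1)) and kurtosis (beta_2)
    of a Beta(a, b) distribution on (0, 1)."""
    s = a + b
    mean = a / s
    skew = 2 * (b - a) * math.sqrt(s + 1) / ((s + 2) * math.sqrt(a * b))
    excess = 6 * ((a - b) ** 2 * (s + 1) - a * b * (s + 2)) / (
        a * b * (s + 2) * (s + 3)
    )
    return mean, skew, excess + 3  # beta_2 = excess kurtosis + 3


# Illustrative shapes only (not the Figure 1 models).
for a, b in [(1.0, 1.0), (2.0, 2.0), (2.0, 5.0)]:
    print(a, b, [round(v, 3) for v in beta_moments(a, b)])
```

Beta(1, 1), the uniform, gives β2 = 1.8, while any model with b > a, such as Beta(2, 5), has a mean below .5 and positive skewness, which is the light-tailed, positively skewed pattern described for most test score distributions.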

PROCEDURE

To generate samples from the five beta distributions rapidly and
efficiently, an algorithm based on the inverse of the generalized
lambda distribution, as developed by Schmeiser (1971), was used.
Samples of sizes (5,5), (5,10), (10,5), (10,10), and (20,20) were
investigated. For each combination of sample sizes, two empirical
power functions were obtained. The first used a beta distribution with
μ = .275 (distribution I of Figure 1) as the X-distribution sampling
model, with successive Y-distribution sampling models being
distributions I through V of Figure 1. The second used a beta
distribution with μ = .3875 (distribution II of Figure 1) as the
X-distribution sampling model, with successive Y-distribution sampling
models being distributions II through V of Figure 1.
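The generalized lambda distribution suits this purpose because its inverse distribution function has a closed form, so each variate costs a single uniform draw. The sketch assumes the Ramberg-Schmeiser parameterization Q(u) = λ1 + (u^λ3 - (1 - u)^λ4)/λ2. The λ values fitted to the five beta models are not given in this section, so the example uses the hypothetical choice λ = (0, 1, 1, 1), which reduces Q to 2u - 1, a uniform law on [-1, 1).

```python
import random


def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Inverse CDF of the generalized lambda distribution (assumed
    Ramberg-Schmeiser form): Q(u) = lam1 + (u**lam3 - (1-u)**lam4) / lam2."""
    return lam1 + (u ** lam3 - (1 - u) ** lam4) / lam2


def gld_sample(n, lams, rng):
    """Inverse-transform sampling: push n uniform draws through Q."""
    return [gld_quantile(rng.random(), *lams) for _ in range(n)]


# Hypothetical lambdas; fitted values for each beta model would go here.
LAMS = (0.0, 1.0, 1.0, 1.0)
SIZE_PAIRS = [(5, 5), (5, 10), (10, 5), (10, 10), (20, 20)]

rng = random.Random(1971)
for m, n in SIZE_PAIRS:
    x = gld_sample(m, LAMS, rng)
    y = gld_sample(n, LAMS, rng)
    print(m, n, round(min(x + y), 3), round(max(x + y), 3))
```

Using one seeded generator for every sample-size pair keeps a simulation run reproducible; separate streams per pair would be an equally reasonable design.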

RESULTS

In the evaluation of any hypothesis testing procedure, the control
over Type I error must be given careful consideration. The empirical
Type I error rates are given in columns one and six of Table I. For
the exact tests (all except t), the majority of empirical significance
values were within expected binomial variation (σ_p = .007 for
p = .05 and N = 1000). Only the empirical values obtained at the .05
level of significance are shown in the table. Similar results were
obtained at the other significance levels, .01 and .10.
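The quoted σ_p is the standard error of a proportion estimated from N independent replications, √(p(1 - p)/N). A short check of the arithmetic follows; the ±2σ_p band is an assumed, conventional reading of "expected binomial variation", not a rule stated in the study.

```python
import math


def binomial_band(p, n_reps, z=2.0):
    """Standard error of an empirical rejection rate over n_reps
    replications, and a +/- z standard-error band around p."""
    se = math.sqrt(p * (1 - p) / n_reps)
    return se, p - z * se, p + z * se


se, lo, hi = binomial_band(0.05, 1000)
print(round(se, 3))  # 0.007
print(round(lo, 4), round(hi, 4))
```

With z = 2, empirical levels between roughly .036 and .064 would count as consistent with a nominal .05 level over 1000 replications.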

Even though the t-test is not an exact procedure for these
distributions, close agreement between empirical and nominal levels
was observed. The exception was that its empirical levels were somewhat
erratic when the sample sizes were unequal. Similar results for the t
statistic have been reported by other investigators and, along with
those given here, confirm the apparent Type I error robustness of the t
statistic for such non-normal population models.

Overall, the t-test exhibited the greatest empirical power in the
situations investigated.† For small, equal samples of size 5, the t
statistic was slightly superior to the other statistics investigated,
with its superiority more pronounced at the .01 level of significance.
In the situations where the sample sizes were unequal, the t-test was,
with one exception, always the most powerful. The exception occurred
when the smaller sample came from the X-distribution and the change in
location parameter was small. Here the Mann-Whitney U and the normal
scores procedures were slightly more powerful than the t.

†Empirical results for the modified Mann-Whitney U and the adaptive
procedure were not obtained for sample size m = n = 5, as tabled
critical values were not available for this case.
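An empirical power value is the fraction of simulated pairs of samples in which a test rejects. The sketch below estimates it for the pooled two-sample t statistic. It makes several assumptions not stated in the study: samples are drawn with Python's `random.betavariate` rather than the generalized lambda approximation, the beta shape parameters are hypothetical stand-ins for the Figure 1 models, the test is one-sided for a larger Y location, and the caller supplies the critical value (about 1.734 for α = .05 with 18 degrees of freedom). The default of 1000 replications matches the N = 1000 quoted for the Type I error runs.

```python
import math
import random
import statistics


def pooled_t(x, y):
    """Two-sample Student t with pooled variance (df = m + n - 2)."""
    m, n = len(x), len(y)
    sp2 = ((m - 1) * statistics.variance(x)
           + (n - 1) * statistics.variance(y)) / (m + n - 2)
    diff = statistics.mean(y) - statistics.mean(x)
    return diff / math.sqrt(sp2 * (1 / m + 1 / n))


def empirical_power(x_params, y_params, m, n, critical, reps=1000, seed=0):
    """Fraction of `reps` simulated sample pairs with t > critical
    (one-sided test for a larger Y location)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.betavariate(*x_params) for _ in range(m)]
        y = [rng.betavariate(*y_params) for _ in range(n)]
        if pooled_t(x, y) > critical:
            rejections += 1
    return rejections / reps


# Hypothetical X and Y models; m = n = 10 gives df = 18.
print(empirical_power((2.0, 5.0), (3.0, 5.0), 10, 10, critical=1.734))
```

Setting `y_params` equal to `x_params` turns the same function into an estimate of the empirical Type I error rate.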

The normal scores and Mann-Whitney U tests were very competitive with
the t. In fact, with a small change in location parameter and equal
sample sizes of 10 and 20, the Mann-Whitney U and normal scores
procedures displayed greater empirical power than did the t. Little
difference was observed between the performance of the normal scores
test and that of the Mann-Whitney U test. Close agreement in the
empirical power values of the normal scores and Mann-Whitney U tests
was also observed in other investigations (Gibbons, 1964; van der Laan
and Oosterhoff, 1967; and Neave and Granger, 1968).
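Both rank procedures assign a score to each rank in the combined sample and sum the scores of the Y observations: the Mann-Whitney U uses the ranks themselves (shifted by n(n+1)/2), while the normal scores test uses expected standard normal order statistics. Because both statistics are monotone in the ranks, one might expect them to behave similarly. The Terry-Hoeffding test uses the exact expected order statistics; to stay within the standard library this sketch substitutes Blom's approximation Φ⁻¹((k - 3/8)/(N + 1/4)), and it assumes no ties.

```python
from statistics import NormalDist


def ranks(values):
    """1-based ranks of `values` (assumes no ties, as with continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def rank_statistics(x, y):
    """Return (U, S): Mann-Whitney U for Y over X, and a normal scores sum
    for Y using Blom's approximation to expected normal order statistics."""
    m, n = len(x), len(y)
    total = m + n
    y_ranks = ranks(list(x) + list(y))[m:]
    u = sum(y_ranks) - n * (n + 1) / 2
    z = NormalDist()
    s = sum(z.inv_cdf((k - 0.375) / (total + 0.25)) for k in y_ranks)
    return u, s


print(rank_statistics([0.21, 0.30, 0.25], [0.33, 0.28, 0.41]))
```

Blom's scores are symmetric about zero, so swapping the roles of the samples changes the sign of S, just as U moves from mn toward 0.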

Based upon the results of this investigation, the t, normal scores,
and Mann-Whitney U statistics would be recommended for use in
detecting a difference between two population means when sampling from
score distributions similar to the models used. Moreover, the two rank
tests, namely normal scores and Mann-Whitney U, involve simpler
arithmetic and may be preferred over the t if tables of their critical
values are readily accessible.



As a quick procedure to test the equality of two population means,
Tukey's test performed well for the distributions sampled. This
procedure compared favorably to the others when sample sizes were
small as well as for small changes in location parameter.
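Tukey's quick test needs only an end count, which is why it is "quick". The sketch follows the usual textbook description: if one sample holds both the overall maximum and the overall minimum, no count is made; otherwise, count the observations of the sample holding the maximum that exceed every value of the other sample, plus the observations of the other sample lying below every value of the first. Textbook accounts commonly quote two-sided critical counts of 7, 10, and 13 for levels of about .05, .01, and .001 when the sample sizes are not too unequal; the study's exact variant (sidedness, handling of ties) is not stated here, so treat this as an assumption.

```python
def tukey_quick_count(x, y):
    """Tukey end count (textbook form). Returns 0 when one sample holds
    both the overall maximum and the overall minimum."""
    if max(x) > max(y):
        high, low = x, y
    else:
        high, low = y, x
    if min(high) <= min(low):
        return 0  # both extremes fall in the same sample
    above = sum(1 for v in high if v > max(low))
    below = sum(1 for v in low if v < min(high))
    return above + below


x = [1.0, 2.0, 3.0, 4.0]
y = [3.5, 5.0, 6.0, 7.0]
print(tukey_quick_count(x, y))  # 6
```

Here three Y values exceed every X value and three X values fall below every Y value, giving a count of 6, just short of the commonly quoted .05 critical count of 7.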

The adaptive procedure usually had an empirical power value between
those obtained for the modified Mann-Whitney U and the Mann-Whitney U,
and closer to that of the modified Mann-Whitney U. This outcome was
attributed to the criterion value of 2.25 (suggested in a personal
communication with Randles and Hogg) used to choose between the
Mann-Whitney U and the modified Mann-Whitney U. Since the adaptive
procedure consistently yielded empirical power values below those of
t, S, or U, it would not be recommended for use in detecting
differences between two population means when sampling from
distributions similar to the models of this study.

The modified Mann-Whitney U did not exhibit high empirical power
values, and thus we do not recommend its use when sampling from
distributions similar to our models. Randles and Hogg (1972) have shown
that this statistic has high relative power under a shift alternative
when sampling from uniform distributions. Even though the uniform
distribution is light-tailed, it is quite different from the models of
this study.

Very little difference in the obtained power functions was observed
when the null distribution was markedly skewed (0.75) as compared to
when it was more moderately skewed (0.34). The one noted difference
was that S and U had more similar empirical power functions when the
null distribution was more moderately skewed.